Introduction



This document gives a high-level description of the Mucho Grande Architecture. It briefly describes
the chips used and gives a brief overview of the functions within the chips. It also describes the
interfaces between the major chips in the architecture. Further, it describes the encoding schemes used
in the architecture as well as the possible error conditions. Lastly, it covers the flow control
mechanism in the architecture.




Serial link technology is used to send data across the backplane. Each serial link carries 2.5 Gbps
of data at a baud rate of 3.125 Gbps. Each serial link IP core offers a mechanism whereby 4 serial
links are grouped to create one 10 Gbps pipe. The architecture combines five 10G pipes into one 50G
pipe. Thus the total switching capacity across the backplane is 50G x 8 x 2 = 800 Gbps.
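The capacity figures above can be checked with a short calculation. This is only an illustrative sketch: the constant names are invented here, the 10/8 ratio between baud rate and data rate is assumed to come from the 8/10 line encoding mentioned later in this document, and the factors 8 and 2 are assumed to stand for eight crosspoint ports and two directions.

```python
# Back-of-the-envelope check of the backplane capacity figures.
# All names are illustrative; none come from the Mucho design itself.
from fractions import Fraction

LINK_DATA_GBPS = Fraction(5, 2)    # 2.5 Gbps of data per serial link
ENCODING_RATIO = Fraction(10, 8)   # assumption: 10 line bits per 8 data bits
LINKS_PER_PIPE = 4                 # 4 serial links form one 10G pipe
PIPES_PER_BLADE = 5                # five 10G pipes form one 50G pipe
PORTS = 8                          # assumption: the "x 8" factor in the text
DIRECTIONS = 2                     # assumption: the "x 2" factor in the text


def backplane_figures():
    """Return the derived rates in Gbps as plain floats."""
    baud = LINK_DATA_GBPS * ENCODING_RATIO
    pipe = LINK_DATA_GBPS * LINKS_PER_PIPE
    blade = pipe * PIPES_PER_BLADE
    total = blade * PORTS * DIRECTIONS
    return {
        "baud_gbps": float(baud),
        "pipe_gbps": float(pipe),
        "blade_gbps": float(blade),
        "total_gbps": float(total),
    }


if __name__ == "__main__":
    for name, value in backplane_figures().items():
        print(f"{name}: {value}")
```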



Having a 50G pipe to the backplane allows us to support four 10G Ethernet streams at line rate across
all packet sizes, two OC-192C streams at line rate, or even one OC-768C stream.



As stated earlier, there is one SBIA per blade. This chip combines 4 independent 10G pipes into one 50G
pipe and directs traffic from the backplane to one of the four 10G pipes. Further, each SBIA is
capable of supporting eight logical 4 Gbps pipes that are multiplexed across IBT chips. The IBT acts as
a bridge to the existing BigIron ASICs (IPC/IGC), which make use of parallel I/O; it converts parallel
to serial and serial to parallel. The following diagram illustrates different blade configurations.

The above diagram tries to illustrate where different pieces would fit into the architecture using the
basic Mucho chipset, which consists of SBIA, SXPNT and IBT. In the case of OC-768C, there would be no
need for SBIA. Although OC-192C is shown taking up two 10G pipes, it could be made to use one. In this
case, there would be a performance impact for small packet sizes.



3 Super Backplane Interface Adapter (SBIA)



SBIA provides a mechanism whereby either four 10G pipes or eight 4/5G pipes can be combined into one
50G pipe across the backplane. The ASIC can be configured to treat traffic from within a blade as either
four 10G pipes or eight 5G pipes. The ASIC also performs local switching between the 4/8 local ports.
For traffic coming from the backplane, the ASIC synchronizes data coming from 5 independent XPNTs to
create one 50G pipe. It then goes through an FID lookup to direct the packet to the appropriate port. In
the case of multicast traffic, the ASIC directs traffic to multiple ports based upon the result obtained
from the FID lookup.



SXPNT is an 8-port crosspoint chip. It receives eight 10G streams of data, each carrying the information
needed to direct traffic to the appropriate port. The switch size is defined to be 1 cell. Each cell is
defined to be either 8, 28, 48, 68, 88, 108, 128, or 148 bytes. Each SXPNT receives 1/5 of the data, in
other words, 1/5 of the cell. Each cell is encoded in such a manner as to allow all 5 SXPNTs to operate
independently and for all SXPNTs to receive the destination slot number. Further details of backplane
cell encoding will be described in a later section. Each SXPNT is capable of 10 x 8 x 2 = 160 Gbps
switching capacity.
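The list of cell sizes above follows a regular pattern. As a hedged illustration, the sketch below regenerates it from a first block carrying 8 data bytes and later blocks carrying 20 bytes each (the block layout given in the backplane encoding section of this document), and repeats the per-chip capacity arithmetic. The function names are invented for this example.

```python
# Illustrative only: regenerate the cell sizes and the SXPNT capacity figure.

def cell_data_sizes(max_blocks=8, first_block_data=8, block_bytes=20):
    """Data bytes carried by a cell of 1..max_blocks blocks."""
    return [first_block_data + block_bytes * k for k in range(max_blocks)]


def sxpnt_capacity_gbps(stream_gbps=10, ports=8, directions=2):
    """Switching capacity of one crosspoint chip: 10 x 8 x 2 in the text."""
    return stream_gbps * ports * directions


if __name__ == "__main__":
    print(cell_data_sizes())
    print(sxpnt_capacity_gbps())
```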



The IBT ASIC acts as a bridge between the IPC/IGC ASICs developed for the BigIron architecture and the
Mucho architecture. More generically, it provides a mechanism to translate two 4/5G parallel streams
into one 10G serial stream. The interface between SBIA and IBT will be described in a later section. The
parallel interface is the backplane interface of the IPC/IGC ASICs. Additionally, IBT can be configured
to work in the BigIron architecture mode, where a 10G serial design can be translated to the BigIron 8G
backplane interface and vice versa.



4 Super Crosspoint (SXPNT) 



5 IPC Bus Translator (IBT) 



6 Transmission & Encoding Scheme for Mucho Grande 
Backplane 



6.1 Introduction 

This section describes a novel encoding scheme for the Mucho Grande backplane. The encoding requires
use of 2 special K characters to communicate control information. The data is sent from the SBIA to the
sXPNT on 5 10G stripes, and from the sXPNT to the SBIA on 5 10G trunks.



6.2 sBIA to sXPNT 

The maximum size of a payload for transfer in the backplane is 160 bytes (148 bytes of data max, 10 bytes
of "Start of Cell" (SOC) control information, and 2 bytes reserved). A complete 160 byte transfer, for
the purpose of this document, is referred to as a "cell". A cycle is a single 3.2 ns clock pulse (i.e.
312.5 MHz). A cell transfer is accomplished (as shown below) in 20 byte "blocks", in 8 consecutive cycles.
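The timing figures in this paragraph are mutually consistent, which the following sketch checks. It is an illustration only; the names are invented, and exact fractions are used so that 3.2 ns does not pick up floating point error.

```python
# Illustrative check of the cell timing: 20-byte blocks, one block per 3.2 ns cycle.
from fractions import Fraction

CELL_BYTES = 160
BLOCK_BYTES = 20
CYCLE_NS = Fraction(16, 5)   # 3.2 ns


def cell_timing():
    blocks = CELL_BYTES // BLOCK_BYTES                # blocks per full cell
    clock_mhz = Fraction(1000) / CYCLE_NS             # 1 / 3.2 ns
    throughput = Fraction(BLOCK_BYTES * 8) / CYCLE_NS  # bits per ns equals Gbps
    cell_ns = blocks * CYCLE_NS                       # time to send a full cell
    return {
        "blocks": blocks,
        "clock_mhz": float(clock_mhz),
        "throughput_gbps": float(throughput),
        "cell_ns": float(cell_ns),
    }


if __name__ == "__main__":
    print(cell_timing())
```

Under these assumptions a full cell occupies 8 cycles and the raw block rate works out to 50 Gbps, matching the 50G pipe described earlier.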
6.2.1 Start of Cell

In the first block transmitted of a given cell, only 8 bytes are actually used for data. Of the remaining
12, 10 are used for control information, and 2 are reserved for debug purposes.

The K0 character is used by the sXPNT to recognize that a new cell has started, and that a state change
may have occurred. The state byte is used by the sXPNT to determine the destination slot for a given cell.



The "state" byte is assigned as follows: 



Field        Name          Description
State[3:0]   SlotNumber    Destination slot number for sBIA to sXPNT and source slot number for
                           sXPNT to sBIA. sBIA will send IDLE packets to slot 7.
State[5:4]   PayloadState  Encoded payload state:
                           00 - RESERVED
                           01 - SOP
                           10 - DATA
                           11 - ABORT
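The field layout of the state byte can be illustrated with a small encoder and decoder. This is a sketch under stated assumptions: the class and function names are hypothetical, and bits 7:6, which the table above does not define, are simply ignored on decode and left as zero on encode.

```python
# Illustrative encode/decode of the "state" byte described in the table above.
from enum import IntEnum
from typing import NamedTuple


class PayloadState(IntEnum):
    RESERVED = 0b00
    SOP = 0b01
    DATA = 0b10
    ABORT = 0b11


class State(NamedTuple):
    slot: int               # State[3:0]
    payload: PayloadState   # State[5:4]


def decode_state(byte: int) -> State:
    """Split a state byte into slot number and payload state."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("state must fit in one byte")
    return State(slot=byte & 0x0F, payload=PayloadState((byte >> 4) & 0x3))


def encode_state(slot: int, payload: PayloadState) -> int:
    """Build a state byte; undefined bits 7:6 are left as zero (assumption)."""
    if not 0 <= slot <= 0x0F:
        raise ValueError("slot number must fit in 4 bits")
    return (int(payload) << 4) | slot


if __name__ == "__main__":
    print(decode_state(0x17))
```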



In the case of transmission from sBIA to sXPNT, the value of "state" will be identical on all 5 stripes.



6.2.2 End of Packet

The scheme transfers 160 byte cells whenever there are 148 bytes of data available to transfer. In the
case where there are fewer than 148 bytes of data, only as many 20 byte blocks as necessary are used. A
special EOP K1 character is used to indicate an end of packet. There are 4 conditions that are of
interest to us, and each is illustrated below:



1. EOP during cycle 1 (i.e. during transmission of state information)

| K0 | state | 00 | 01 || K0 | state | 02 | 03 || K0 | state | K1 | K1 || K0 | state | K1 | K1 || K0 | state | RES | RES |

Note that the K0, state, and reserved bytes are all preserved, as in any other cycle 1 transmission. The
K1 character is treated as data.

2. EOP during cycle n (n != 0)

3. EOP at block boundary during cycle n (n != 8)

Note that when n > 0, the block boundary for data is in lane 3 of stripe 5. However, for n = 0, the block
boundary for data is in lane 3 of stripe 4.

4. EOP at cell boundary

| ... | K1 || K0 | state | K1 | K1 || K0 | state | K1 | K1 || K0 | state | RES | RES |



Note that a cell need not be composed of 8 blocks, and can potentially be just one. The other exception
to this rule is when an abort is sent. ABORT is used during an erroneous condition, in which case a cell
could be less than 160 bytes and still not have an EOP. The very next cell for that destination will be
an ABORT cell.
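As a rough illustration of "only as many 20 byte blocks as necessary", the sketch below counts the blocks needed to carry a given number of data bytes, with 8 data bytes in the first block and 20 in each later block. It is a simplification: the function name is invented, and it does not model whether the K1 EOP marker needs an additional block when the data ends exactly on a block or cell boundary.

```python
# Illustrative block count for a cell carrying nbytes of data (EOP marker not modeled).

FIRST_BLOCK_DATA = 8    # data bytes in the cycle 1 block
BLOCK_DATA = 20         # data bytes in every later block
MAX_CELL_DATA = 148     # maximum data bytes per cell


def blocks_for_data(nbytes: int) -> int:
    """Number of 20-byte blocks needed to carry nbytes of data in one cell."""
    if not 0 < nbytes <= MAX_CELL_DATA:
        raise ValueError("a cell carries between 1 and 148 data bytes")
    if nbytes <= FIRST_BLOCK_DATA:
        return 1
    remaining = nbytes - FIRST_BLOCK_DATA
    # Integer ceiling division without floating point.
    return 1 + -(-remaining // BLOCK_DATA)


if __name__ == "__main__":
    for n in (4, 8, 9, 28, 29, 148):
        print(n, blocks_for_data(n))
```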



6.2.3 Operation Concerns

Payload transmission size and method were both devised for very specific reasons. 



6.2.3.1 Data Rate 



In determining the ideal number of blocks in a cell, there are a number of constraints to consider.
First, the sBIA design is optimized for transfers that are a multiple of 2 cycles long. Second, we would
like to send as many pure "data" blocks per cell as possible, so as to mitigate the overhead lost to the
cycle 1 block. Third, we would like to minimize the number of blocks per cell so as to arbitrate more
quickly between the various devices connected to the sXPNT. Finally, and most importantly, we must run
at line rate. A cell size of 6 blocks would only transmit 108 bytes of data, which causes the backplane
data rate to drop below line rate for Ethernet packets of size 124 bytes. The next larger cell size
available to us is 8 blocks. This leaves us with 7.5% of bandwidth lost to overhead, and allows for
rapid arbitration between the various sBIA devices.
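The trade-off between cell length and overhead can be tabulated directly. The sketch below is illustrative only (the function name is invented); it assumes the fixed 12 bytes of control and reserved overhead per cell stated earlier and considers only even block counts, per the first constraint.

```python
# Illustrative overhead calculation for candidate cell sizes.

def cell_overhead(blocks: int, block_bytes: int = 20, overhead_bytes: int = 12):
    """Return (data bytes per cell, fraction of bandwidth lost to overhead)."""
    if blocks < 1:
        raise ValueError("a cell has at least one block")
    total = blocks * block_bytes
    return total - overhead_bytes, overhead_bytes / total


if __name__ == "__main__":
    for blocks in (2, 4, 6, 8):
        data, lost = cell_overhead(blocks)
        print(f"{blocks} blocks: {data} data bytes, {lost:.1%} overhead")
```

With these inputs a 6 block cell carries 108 data bytes and an 8 block cell carries 148, with the overhead falling to 7.5% at 8 blocks.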



6.2.3.2 Reserved bytes

The cycle 1 block gives up half of its bandwidth to control information - a necessity for functionality.
However, in stripe 5, we also "reserve" the 2 data bytes that were left. This was done to simplify
transfers from the existing packet processors (PPs). Both IGC and IPC are designed to transfer data in
4 byte increments. Therefore, to transmit an entire PP's payload, a multiple of 4 bytes must be available
for data, or only a fraction of the payload will be sent. By reserving the 2 bytes of block 1, we leave
the number of available bytes as 8, and avoid any byte rotations. This is a non-issue when all 20 bytes
are available for data.



6.3 sXPNT to sBIA

The constraints on the payload size for transfers from the sXPNT to the sBIA are identical to those for
transfers in the opposing direction. However, there are major differences in the operation of the
individual stripes.



6.3.1 Cell Boundary Alignment

In the case of transfers from the sBIA to the sXPNT, all of the K0 characters were aligned within the
same block. This was a simple task to accomplish as a single device was driving all five stripes.
However, in this case, five devices are each driving a single stripe. While we are guaranteed to receive
a K0 character at least every eight blocks for any given stripe, we have no guarantees regarding the
alignment of the stripes themselves.







7 Mucho's Error Handling Capabilities



This section describes the error conditions that may occur when using serial link technology in the
backplane, the preventive measures against them, and the recovery mechanisms. It also describes the
reset procedure used to sync up different blades and walks through the events during hot swapping of
blades.



7.1 Error Description

There are 3 types of errors that can occur in the backplane. 



1. Link Error. This occurs as a result of a bit error or a byte alignment problem within a serdes.
Since the clock is recovered from the data stream, there exists a possibility of a byte alignment
problem if there aren't enough data transitions. Bit errors can also occur as a result of external noise
on the line. The serdes also detects exception conditions, such as an SOP character in lane 1, and marks
them as link errors. The probability of either occurring is on the order of 1 failure per 10^13 data
bits transmitted.

2. Lane Synchronization Error. A lane is defined as 1 serial link among the 4 serial links that make up
the 10 gig serdes. There exists a 4 deep fifo within the serdes cores to compensate for any possible
transmission line skew and to synchronize the lanes so as to present a unified 10 gig stream to the core
logic. There are possible cases where the fifos might overflow/underflow, which could result in a lane
synchronization error. There also exist cases when a lane synchronization sequence might reveal a
possible alignment problem. The probability of this occurring is unknown at this time. Broadcom is in
the process of characterizing the serdes to determine the error rate.

3. Stripe Synchronization Error. This is a result of our architecture, in which data is sent as a 50 gig
pipe to the backplane but is striped across 5 independent XPNT chips arbitrating independently. The
receiving BIA contains multiple 64 deep fifos (156.25 MHz) that are sorted according to sending source
and stripe. Once the fifos fill across all stripes, a whole block (20 bytes) is read. There exist cases
when one of the fifos might overflow as a result of one of the above conditions, or some other unknown
case where the stripes might be completely out of sync. The probability of this is unknown at this time.
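The behavior of the receive-side sync queues in item 3 can be modeled in a few lines. This is a simplified sketch, not the actual BIA logic: the class and method names are invented, one instance stands for the queues of a single source slot, and each stripe is assumed to deliver 4 bytes of every 20 byte block.

```python
# Illustrative model of per-source sync queues: a block is read only when
# every stripe has delivered its share; a full fifo signals a stripe sync problem.
from collections import deque

STRIPES = 5
DEPTH = 64        # fifo depth from the text
WORD_BYTES = 4    # assumption: 20-byte block split evenly over 5 stripes


class SyncQueues:
    """Sync queues for one source slot, one fifo per stripe (hypothetical model)."""

    def __init__(self):
        self.fifos = [deque() for _ in range(STRIPES)]

    def write(self, stripe: int, word: bytes) -> None:
        if len(word) != WORD_BYTES:
            raise ValueError("each stripe carries 4 bytes per block")
        fifo = self.fifos[stripe]
        if len(fifo) >= DEPTH:
            raise OverflowError(f"stripe {stripe} fifo overflow")
        fifo.append(word)

    def read_block(self):
        """Return a 20-byte block once all stripes have data, else None."""
        if any(len(fifo) == 0 for fifo in self.fifos):
            return None
        return b"".join(fifo.popleft() for fifo in self.fifos)


if __name__ == "__main__":
    queues = SyncQueues()
    for stripe in range(STRIPES):
        queues.write(stripe, bytes([stripe] * WORD_BYTES))
    print(queues.read_block())
```

In this model a stripe that runs ahead of the others fills its fifo while the block reader stalls, which is the overflow case the text describes.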



7.2 Detection and Preventive Care 

1. Link Error. Once a link error is detected by the serdes, it passes an /E character through the
parallel interface. A link error is detected when the 8/10 decoder decodes an invalid character or when
a link is lost. After reset, the serdes go through a training sequence which allows the receiver to
adjust its sampling point to the middle of a bit time; through the transition density offered by 8/10
encoding, the receiver in theory stays in sync. If, however, the receiver's sampling point gets out of
sync, then at some point an invalid code is detected and passed through the parallel interface. The way
to prevent the sampling point from going out of sync is to send a random number of /K and /R characters
through the stream during idle operation. These K characters are special characters that allow the
serdes to stay in sync by adjusting its sampling point. In the Mucho implementation, any time the link
becomes idle, either a /K or a /R will be transmitted. A pseudo random number generator will determine
the selection of /K and /R characters.

2. Lane Synchronization Error. There are 2 ways that the serdes detects a possible lane alignment
problem. The first is when the fifo overflows/underflows. In this case, a signal is sent from the serdes
to the core logic. The second is when the serdes detects a failure while a lane synchronization sequence
is being sent across all lanes. In this case, the point at which the data seen by the core went out of
sequence is not known. The lane synchronization sequence consists of /A, /K, and /R characters sent
across all lanes simultaneously. Prior to lane synchronization, the serdes require a finite number of
random /K, /R characters to ensure that byte alignment exists. The serdes check to see if all these /A,
/K, /R characters come across all lanes at once. Failure to detect the /A, /K, /R characters aligned
across all lanes is considered a lane synchronization error. The way to prevent the lanes from going out
of sync is to periodically send out the /A, /K, /R sequence. In the Mucho implementation, there will be
a programmable 32-bit counter running at 156.25 MHz that sends out the lane synchronization sequence
when it overflows. Further, there will be a programmable 7-bit padding register that determines the
number of random /K, /R characters to pad prior to and after the /A, /K, /R sequence. We will work with
Broadcom to determine the production values of both of these registers once the backplane is
characterized.



3. Stripe Synchronization. There are a couple of ways to detect a possible stripe synchronization
problem. The first is if an invalid pattern is found across a block (a block is defined as 5 stripes).
An example of this would be detection of a K0 pattern in lane 0 of some stripes but not across all 5.
This pattern would never be sent across the backplane, and detection of such a pattern is considered to
be a stripe synchronization error. The other case is during stripe synchronization, if there are any
entries left in the queue. Stripe synchronization will consist of sending a K2 character 64 times across
all lanes and all stripes, after which all stripes of the sync queues will write data to a known
location, thus guaranteeing stripe synchronization. The number 64 is chosen since it matches the depth
of the sync queues; thus, if there is any data left in the queue after the final K2 character is
detected, this is considered a stripe synchronization error. In the Mucho implementation, the BIA will
send out the following pattern 64 times:

| K2 | state | K2 | K2 || K2 | state | K2 | K2 || K2 | state | K2 | K2 || K2 | state | K2 | K2 || K2 | state | K2 | K2 |

The state field is encoded with the destination slot number as well as 1 bit used to tell whether
you're in the middle of the stripe sequence or whether this is the last K2 transfer, after which valid
data follows. The XPNT will pass this through as if it were any other data.
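The idle fill from item 1 and the periodic lane synchronization from item 2 can be sketched together. Several details below are assumptions made for illustration and are not stated in the text: the 7-bit LFSR used as the pseudo random source, the idea that the 32-bit counter is programmed by loading a start value and counting up, and all class and function names.

```python
# Illustrative sketch of idle fill and periodic lane synchronization.

CORE_CLOCK_HZ = 156.25e6   # counter clock from the text
COUNTER_BITS = 32          # programmable counter width from the text
PAD_BITS = 7               # padding register width from the text


class IdleGenerator:
    """Choose /K or /R for each idle character using a 7-bit LFSR (assumed)."""

    def __init__(self, seed: int = 0x7F):
        if not 0 < seed < 0x80:
            raise ValueError("seed must be a non-zero 7-bit value")
        self.state = seed

    def next_char(self) -> str:
        # Feedback is the XOR of the two oldest bits in the register.
        bit = ((self.state >> 6) ^ (self.state >> 5)) & 1
        self.state = ((self.state << 1) | bit) & 0x7F
        return "/K" if bit else "/R"


def lane_sync_interval_s(start_value: int = 0) -> float:
    """Seconds until an up-counter loaded with start_value overflows (assumed scheme)."""
    if not 0 <= start_value < 2 ** COUNTER_BITS:
        raise ValueError("start value must fit in 32 bits")
    return (2 ** COUNTER_BITS - start_value) / CORE_CLOCK_HZ


def lane_sync_sequence(gen: IdleGenerator, pad: int) -> list:
    """Random idles, then /A /K /R, then random idles, per the padding register."""
    if not 0 <= pad < 2 ** PAD_BITS:
        raise ValueError("pad count must fit in 7 bits")
    before = [gen.next_char() for _ in range(pad)]
    after = [gen.next_char() for _ in range(pad)]
    return before + ["/A", "/K", "/R"] + after


if __name__ == "__main__":
    print(lane_sync_sequence(IdleGenerator(), pad=3))
    print(f"{lane_sync_interval_s():.3f} s between sequences at start value 0")
```

Under these assumptions the longest programmable interval between lane synchronization sequences is about 27.5 seconds; shorter intervals follow from larger start values.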



7.3 Error Handling 

The implementation will not differentiate between the different types of errors described above and
will treat all of them as a violation error. This section describes the error process at each failure
point and the reasoning behind it. There are 4 places where an error is detected.

1. XPNT->BIA. Error is detected on the receiving side of the BIA from one of the XPNTs. When a link
error or a lane error exists in one of the stripes, there exist cases when the source that was sending
the cell when the error occurred cannot be determined, since the error could be detected between cell
boundaries. Further, in the case of a lane synchronization error, the point at which a particular
stripe lost synchronization cannot be known. For this reason, a general procedure is defined to handle
link errors and lane synchronization errors. The procedure is as follows:

a. The BIA will mark all packets, from all slots from which it has received a portion of a packet, as
being aborted. This will flush the packets down through the BIA onto the packet processor as AOP
packets.

b. It will reset all of its sync queue write pointers and sync queue read pointers to a known value
(0). This will ensure stripe synchronization.

c. Wait for the error condition to go away. No process will be defined to tell the source slot from
sending sync characters.

d. Wait for the stripe alignment sequence across each slot. Once a sending source sends a stripe
alignment character, the source slot is marked as in sync. This is needed since during an error the 5
stripes would have difficulty knowing which SOP to sync to, since packets of 40 bytes could exist and
thus a new packet could start every clock cycle. One could overcome this problem by having all packets
track a sequence number, but this requires at least a 5-bit sequence number due to latency between
stripes within the XPNT. The 5 bits don't exist. This is a much more robust and much simpler solution.

e. Wait for an SOP following the slot being marked as stripe aligned.

In the case of a stripe alignment error, there is no need to flush all the slots. Thus only the slot
with a stripe alignment error is put through the above process.

2. BIA->XPNT. Error is detected on the receiving side of the XPNT. There are 2 types of problem that can
exist: link error and lane synchronization error. At first glance, it seems like a simple problem to
solve: send an AOP and you're done. However, this poses an interesting problem on the receiving BIA in
that the 5 independent XPNTs now have different numbers of inputs to arbitrate from. An example of this
is a scenario where slots 0, 1, and 2 are transmitting to slot 7 and slot 0's 5th stripe link goes down.
In this scenario, stripes 1 to 4 will arbitrate between slots 0, 1, and 2 while the 5th stripe will
arbitrate between only slots 1 and 2 for destination slot 7. This results in a possible temporary
overflow of the sync queue for slot 0 for stripes 1 to 4 on destination slot 7, and overflow of slot 1
and slot 2 for stripe 5 for destination slot 7. This problem spreads to all slots if there is enough
traffic on the backplane and the link is down for a long time. The receiving XPNT that detects the error
cannot know the destination slot of the data when an error occurred, similar to a receiving BIA getting
a link error or lane error. For this reason, it must flush all its destination queues from the bad
source slot. The following procedure will be followed by the XPNT in case of an error:

a. Send an AOP to all slots.

b. Wait for error to go away.

c. Sync to K0 token after error goes away to begin accepting data.

On the receiving BIA side, when an AOP is detected from a particular stripe, the same process as for
one of the stripes being out of sync is followed. This could also result in stripe synchronization
errors on different slots, as described in the example above, in which case the stripe synchronization
procedure for the BIA is followed.

Since the probability of such failures is unknown at this time, the following OPTIONAL signals will
also exist that need not be hooked up on the board. There will be a set of error ready's from each XPNT
to BIA that will be wire-ORed on the board and hooked to the BIA. The motivation for this is that when
an error is detected by one of the XPNTs, the XPNT can stop the BIA from sending any more data. This
prevents huge amounts of data from being dropped when an error exists. It will also prevent sync queues
from overflowing on the receive side of the BIA. The sending BIA will continuously repeat the lane
synchronization sequence until the error signal is deasserted. After deassertion, it will send out the
stripe synchronization sequence to all slots. The other OPTIONAL set of signals that were considered but
determined to be fatal were flow control from BIA to XPNT based on the sync queues reaching a certain
threshold. The motivation behind this was to prevent loss of data due to momentary overflowing of
queues when a BIA->XPNT failure occurs. This was considered fatal in that there could exist cases where
one of the stripes would overflow while one of them is empty. This could result in a hang condition. To
prevent this from occurring, the flow control mechanism must consist of flow control based on source
slot as well as individual stripes. Due to the huge pin requirement on the XPNT (56), as well as the
huge pin requirement on the BIA (35), if this option is to be considered it must be some sort of serial
backpressure.

3. IBT->BIA. An error detected on the receiving side of the BIA is treated identically to an error
received on the XPNT from the BIA. As with the other errors described above, since the destination slot
cannot be known under certain conditions by the BIA, the following process is followed:

a. Send an AOP to all slots.

b. Wait for error to go away.

c. Sync to K0 token after error goes away to begin accepting data.

4. BIA->IBT. An error detected on the receiving side of the IBT is treated identically to an error seen
by the BIA from the IBT. The following process will be used:

a. Send an AOP to all slots of the downstream IPC/IGC to terminate any packet in progress.

b. Wait for error to go away.

c. Sync to K0 token after error goes away to begin accepting data.
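The three-step recovery used above by the XPNT and IBT receivers (send AOP, wait for the error to clear, resync on a K0) can be sketched as a small state machine. This is an illustration only: the class, state names, and the per-cycle `step` interface are invented for the example, and the BIA's longer XPNT->BIA procedure with stripe alignment is not modeled.

```python
# Illustrative receiver recovery state machine: AOP -> wait for clear -> sync to K0.
from enum import Enum, auto


class RxState(Enum):
    ACCEPTING = auto()
    WAIT_ERROR_CLEAR = auto()
    WAIT_K0 = auto()


class Receiver:
    """Hypothetical model of one receive port and the slots behind it."""

    def __init__(self, slots):
        self.slots = list(slots)
        self.state = RxState.ACCEPTING
        self.aborted = []   # slots that have been sent an AOP

    def step(self, error: bool, k0_seen: bool) -> bool:
        """Advance one cycle; return True if this cycle's data is accepted."""
        if error:
            if self.state is RxState.ACCEPTING:
                self.aborted.extend(self.slots)      # a. send an AOP to all slots
            self.state = RxState.WAIT_ERROR_CLEAR    # b. wait for error to go away
            return False
        if self.state is RxState.WAIT_ERROR_CLEAR:
            self.state = RxState.WAIT_K0             # error has cleared
        if self.state is RxState.WAIT_K0:
            if not k0_seen:
                return False
            self.state = RxState.ACCEPTING           # c. sync to K0, accept data again
        return True


if __name__ == "__main__":
    rx = Receiver(range(8))
    trace = [(False, False), (True, False), (False, False), (False, True)]
    print([rx.step(err, k0) for err, k0 in trace])
```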



7.4 Reset Procedure 



The following reset procedure will be followed to get the serdes in sync. An external reset will be
asserted to the serdes core when a reset is applied to the core. The duration of the reset pulse for the
serdes need not be longer than 10 cycles. After the reset pulse, the transmitter and the receiver of the
serdes will sync up to each other through a defined procedure. It is assumed that the serdes will be in
sync once the core comes out of reset. For this reason, the reset pulse for the core must be
considerably longer than the reset pulse for the serdes core.

The core will rely on software interaction to get in sync. Once the BIA, IBT, and XPNT come out of
reset, they will continuously send the lane synchronization sequence. The receiver will set a software
visible bit stating that its lanes are in sync. Once software determines that the lanes are in sync, it
will try to get the stripes in sync. This is done through software, which will enable continuous sending
of the stripe synchronization sequence. Once again, the receiving side of the BIA will set a bit stating
that it's in sync with a particular source slot. Once software determines this, it will enable transmit
for the BIA, XPNT and IBT.
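The software side of this bring-up can be sketched as a polling sequence. Everything concrete below is hypothetical: the `FakeChip` class stands in for the software visible bits of a BIA, IBT, or XPNT, the method names are invented, and treating every chip as having a stripe sync bit is a simplification of the text, which only mentions that bit on the BIA.

```python
# Illustrative bring-up order: lanes in sync -> stripes in sync -> enable transmit.

class FakeChip:
    """Stand-in for a chip's software-visible status bits (hypothetical)."""

    def __init__(self, name: str, polls_until_sync: int = 2):
        self.name = name
        self._needed = polls_until_sync
        self._lane_polls = 0
        self._stripe_polls = 0
        self.stripe_sync_enabled = False
        self.transmit_enabled = False

    def lane_in_sync(self) -> bool:
        self._lane_polls += 1
        return self._lane_polls >= self._needed

    def stripe_in_sync(self) -> bool:
        if not self.stripe_sync_enabled:
            return False
        self._stripe_polls += 1
        return self._stripe_polls >= self._needed


def wait_for(checks, poll_limit: int) -> None:
    """Poll every check each round until all report True, or give up."""
    for _ in range(poll_limit):
        if all([check() for check in checks]):
            return
    raise TimeoutError("chips did not report sync in time")


def bring_up(chips, poll_limit: int = 1000) -> None:
    # Chips send the lane synchronization sequence on their own after reset.
    wait_for([c.lane_in_sync for c in chips], poll_limit)
    for chip in chips:
        chip.stripe_sync_enabled = True     # software enables the stripe sync sequence
    wait_for([c.stripe_in_sync for c in chips], poll_limit)
    for chip in chips:
        chip.transmit_enabled = True        # finally enable normal transmit


if __name__ == "__main__":
    chips = [FakeChip("BIA"), FakeChip("IBT"), FakeChip("XPNT")]
    bring_up(chips)
    print([(c.name, c.transmit_enabled) for c in chips])
```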



7.5 Hot Swapping 

Hot swapping will result in a link error on the XPNT. This will be equivalent to the XPNT->BIA link
going down. The same procedure can be followed. When a new blade is plugged in, it will require a
software walk-through as described in the reset procedure to bring the new blade to life.



8 IBT to SBIA Encoding Scheme 

IBT transmits and receives packets to and from SBIA through a XAUI interface. Packets are segmented
into cells which consist of a four byte header followed by 32 bytes of data. End of packet is signaled
by a K1 special character on any invalid data bytes within a four byte transfer, or by four K1
characters on all XAUI lanes. Each byte is serialized onto one XAUI lane.

A packet is formatted into data cells which consist of a header plus a data payload. The 32 bit header
takes one cycle on the four XAUI lanes. It has a K0 special character on Lane 0 to indicate that the
current transfer is a header. The state information goes on Lane 1 of a header.









Field        Name                Description
State[3:0]   SlotNumber          Destination slot number from IBT to SBIA. IPC can address 10 slots (7
                                 remote and 3 local) and IGC can address 14 slots (7 remote and 7 local).
State[5:4]   PayloadState        Encoded payload state:
                                 00 - RESERVED
                                 01 - SOP
                                 10 - DATA
                                 11 - ABORT
State[6]     Source/Destination  Encoded source/destination IPC id number:
             IPC                 0 - to/from IPC0
                                 1 - to/from IPC1
State[7]     Reserved            Reserved



9 Flow Control 

IBT to SBIA flow control:

IBT flow controls SBIA on a per IPC basis through two dedicated pins, sds_rxready[1:0].
IBT de-asserts sds_rxready[1:0] under two conditions:

1) The IBT internal FIFO for IPC0/1 is filled above its high water mark.

2) IPC0/1 de-asserts its fap_rxready to IBT.

SBIA to IBT flow control:

SBIA flow controls IBT on a per IPC/IGC, per destination slot basis. It serializes 14 ready statuses to
IBT through a periodic sync and two txready signals for each IPC. IBT decodes them and passes them on to
IPC0/1 in 6 cycles. IPC and IGC systems will have the same clock cycle latencies.



sync  Tx ready[3:2]     Tx ready[1:0]     Description
1     IPC1 Slot[1:0]    IPC0 Slot[1:0]    Slot[1:0] ready to receive from IPC/IGC 0/1
0     IPC1 Slot[3:2]    IPC0 Slot[3:2]    Slot[3:2] ready to receive from IPC/IGC 0/1
0     IPC1 Slot[5:4]    IPC0 Slot[5:4]    Slot[5:4] ready to receive from IPC/IGC 0/1
0     IPC1 Slot[7:6]    IPC0 Slot[7:6]    Slot[7:6] ready to receive from IPC/IGC 0/1
0     IPC1 Slot[9:8]    IPC0 Slot[9:8]    Slot[9:8] ready to receive from IPC/IGC 0/1
0     IPC1 Slot[11:10]  IPC0 Slot[11:10]  Slot[11:10] ready to receive from IPC/IGC 0/1
0     IPC1 Slot[13:12]  IPC0 Slot[13:12]  Slot[13:12] ready to receive from IPC/IGC 0/1
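The serialization in the table above can be illustrated with a matching encoder and decoder. This is a sketch under stated assumptions: the function names are invented, each transfer is modeled as a (sync, txready) pair with a 4-bit txready value, and sync is assumed to be 1 only on the first of the seven transfers.

```python
# Illustrative encode/decode of the serialized 14-slot ready status.

SLOTS = 14
TRANSFERS = SLOTS // 2   # two slot bits per IPC in each transfer


def encode_ready(ipc0_mask: int, ipc1_mask: int):
    """Turn two 14-bit ready masks into seven (sync, txready[3:0]) transfers."""
    if not (0 <= ipc0_mask < 1 << SLOTS and 0 <= ipc1_mask < 1 << SLOTS):
        raise ValueError("ready masks must fit in 14 bits")
    frames = []
    for i in range(TRANSFERS):
        low = (ipc0_mask >> (2 * i)) & 0x3    # Tx ready[1:0] carries IPC0 slots
        high = (ipc1_mask >> (2 * i)) & 0x3   # Tx ready[3:2] carries IPC1 slots
        frames.append((1 if i == 0 else 0, (high << 2) | low))
    return frames


def decode_ready(frames):
    """Rebuild (ipc0_mask, ipc1_mask); bit n set means slot n is ready."""
    frames = list(frames)
    if len(frames) != TRANSFERS:
        raise ValueError("expected seven transfers")
    ipc0 = ipc1 = 0
    for i, (sync, txready) in enumerate(frames):
        if sync != (1 if i == 0 else 0):
            raise ValueError("sync must mark only the first transfer")
        ipc0 |= (txready & 0x3) << (2 * i)
        ipc1 |= ((txready >> 2) & 0x3) << (2 * i)
    return ipc0, ipc1


if __name__ == "__main__":
    frames = encode_ready(0b00000000000001, 0b00000000000010)
    print(frames)
    print(decode_ready(frames))
```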



